Uncovering relationships between variables is a core step in most data analyses. In this post, I’ll walk you through R code that shows how to use the cor() function. Whether you’re a student or a researcher in Europe, these examples will help you calculate correlation coefficients, build correlation matrices, test correlations for significance, and visualize the results. All examples use the classic mtcars dataset, which ships with R, so you can follow along without downloading anything.

1. Basic Correlation with mtcars

The simplest way to start is by calculating the correlation between two variables. For example, with the built-in mtcars dataset, calling cor() on the mpg and hp columns gives:

## [1] -0.7761684

This value is the Pearson correlation coefficient between miles per gallon (mpg) and horsepower (hp). A value close to 1 means a strong positive linear relationship, a value near -1 means a strong negative one, and a value near 0 means little or no linear relationship. Here, -0.78 tells you that cars with more horsepower tend to get fewer miles per gallon. Checking a single pair like this gives you a quick read on how two variables move together and helps you decide which further analysis steps are worth taking.
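A minimal sketch of that calculation, assuming the two columns are passed to cor() directly as vectors (the variable name r_mpg_hp is only for illustration):

```r
# mtcars ships with base R (datasets package), so nothing needs to be installed
data(mtcars)

# Pearson correlation (the default method) between mpg and hp
r_mpg_hp <- cor(mtcars$mpg, mtcars$hp)
print(r_mpg_hp)
```

The order of the two arguments does not matter, since the correlation of mpg with hp equals the correlation of hp with mpg.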

2. Using dplyr for Correlation Analysis

To compute correlations for all numeric columns in your dataset, the dplyr package is very useful. By selecting the numeric columns and piping them into cor():

##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
## disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
## hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
## drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
## wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
## qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
## vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
## am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
## gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
## carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
##             qsec         vs          am       gear        carb
## mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
## cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
## hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
## drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
## wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
## qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
## am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
## gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
## carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

you generate a full correlation matrix. It shows the correlation coefficient for every pair of variables, so you can compare many relationships in one view. The matrix is symmetric, with 1s on the diagonal because every variable correlates perfectly with itself. This approach is especially handy for datasets with many columns, since one call replaces dozens of pairwise calculations, and it is simple enough for beginners who want a quick snapshot of their data.
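One way to write this with dplyr is sketched below. It assumes dplyr 1.0.0 or later, which provides the where() selection helper; the name cor_matrix is illustrative, and older code often used select_if(is.numeric) for the same purpose.

```r
# install.packages("dplyr")  # uncomment on first use
library(dplyr)

cor_matrix <- mtcars %>%
  select(where(is.numeric)) %>%  # keep only the numeric columns
  cor()                          # Pearson correlation for every pair of columns

# Rounding makes the matrix easier to scan
round(cor_matrix, 2)
```

Every column of mtcars is numeric, so the select() step changes nothing here. It matters for datasets that mix numeric columns with text or factor columns, where cor() would otherwise stop with an error.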

3. Advanced Analysis with Hmisc and Custom Functions

For deeper insights, the Hmisc package is a powerful tool. The idea is to install and load Hmisc, then define a custom function that formats the correlation matrix. Applied to mtcars, it produces:

##           mpg|      cyl|     disp|       hp|     drat|       wt|     qsec|
## mpg   1.000  | -0.852**| -0.848**| -0.776**|  0.681**| -0.868**|  0.419* |
## cyl  -0.852**|  1.000  |  0.902**|  0.832**| -0.700**|  0.782**| -0.591**|
## disp -0.848**|  0.902**|  1.000  |  0.791**| -0.710**|  0.888**| -0.434* |
## hp   -0.776**|  0.832**|  0.791**|  1.000  | -0.449**|  0.659**| -0.708**|
## drat  0.681**| -0.700**| -0.710**| -0.449**|  1.000  | -0.712**|  0.091  |
## wt   -0.868**|  0.782**|  0.888**|  0.659**| -0.712**|  1.000  | -0.175  |
## qsec  0.419* | -0.591**| -0.434* | -0.708**|  0.091  | -0.175  |  1.000  |
## vs    0.664**| -0.811**| -0.710**| -0.723**|  0.440* | -0.555**|  0.745**|
## am    0.600**| -0.523**| -0.591**| -0.243  |  0.713**| -0.692**| -0.230  |
## gear  0.480**| -0.493**| -0.556**| -0.126  |  0.700**| -0.583**| -0.213  |
## carb -0.551**|  0.527**|  0.395* |  0.750**| -0.091  |  0.428* | -0.656**|
##            vs|       am|     gear|     carb|
## mpg   0.664**|  0.600**|  0.480**| -0.551**|
## cyl  -0.811**| -0.523**| -0.493**|  0.527**|
## disp -0.710**| -0.591**| -0.556**|  0.395* |
## hp   -0.723**| -0.243  | -0.126  |  0.750**|
## drat  0.440* |  0.713**|  0.700**| -0.091  |
## wt   -0.555**| -0.692**| -0.583**|  0.428* |
## qsec  0.745**| -0.230  | -0.213  | -0.656**|
## vs    1.000  |  0.168  |  0.206  | -0.570**|
## am    0.168  |  1.000  |  0.794**|  0.058  |
## gear  0.206  |  0.794**|  1.000  |  0.274  |
## carb -0.570**|  0.058  |  0.274  |  1.000  |

This function converts your data into a matrix, computes both the correlation coefficients and their p-values, and then appends significance markers (stars) based on the p-values: more stars mean a smaller p-value, and no star means the correlation did not reach the function’s significance threshold. The formatted output lets you read the strength and the significance of each relationship at a glance. It’s a good example of how custom code can tailor an analysis to your needs.
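A function along these lines could look like the sketch below. The name cor_with_stars is hypothetical, and the thresholds (two stars for p < 0.01, one star for p < 0.05) are an assumption inferred from the output shown above. It relies on rcorr() from Hmisc, which returns the coefficients in r and the p-values in P.

```r
# install.packages("Hmisc")  # uncomment on first use
library(Hmisc)

# Hypothetical helper: correlation matrix with significance stars.
# Assumed thresholds: "**" for p < 0.01, "*" for p < 0.05.
cor_with_stars <- function(data, digits = 3) {
  res <- rcorr(as.matrix(data))  # list with r (coefficients), n, and P (p-values)
  r <- res$r
  p <- res$P                     # the diagonal of P is NA

  stars <- ifelse(is.na(p), "  ",
           ifelse(p < 0.01, "**",
           ifelse(p < 0.05, "* ", "  ")))

  # Fixed-width numbers so the columns line up, then append the stars
  cells <- paste0(formatC(r, format = "f", digits = digits, width = digits + 3),
                  stars)
  out <- matrix(cells, nrow = nrow(r), dimnames = dimnames(r))
  noquote(out)
}

cor_with_stars(mtcars)
```

The p-values here are not adjusted for multiple comparisons, so with 55 distinct pairs a few stars can appear by chance alone.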

4. Handling Missing Data and Using Different Methods

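Real-world data often contains missing values, and by default cor() returns NA for any pair of variables that includes one. The use argument of cor() controls how missing values are handled. Setting it to "complete.obs" on mtcars gives: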

##             mpg        cyl       disp         hp        drat         wt
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958
## disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799
## hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479
## drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406
## wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000
## qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159
## vs    0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846 -0.5549157
## am    0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113 -0.6924953
## gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870
## carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059
##             qsec         vs          am       gear        carb
## mpg   0.41868403  0.6640389  0.59983243  0.4802848 -0.55092507
## cyl  -0.59124207 -0.8108118 -0.52260705 -0.4926866  0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692  0.39497686
## hp   -0.70822339 -0.7230967 -0.24320426 -0.1257043  0.74981247
## drat  0.09120476  0.4402785  0.71271113  0.6996101 -0.09078980
## wt   -0.17471588 -0.5549157 -0.69249526 -0.5832870  0.42760594
## qsec  1.00000000  0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs    0.74453544  1.0000000  0.16834512  0.2060233 -0.56960714
## am   -0.22986086  0.1683451  1.00000000  0.7940588  0.05753435
## gear -0.21268223  0.2060233  0.79405876  1.0000000  0.27407284
## carb -0.65624923 -0.5696071  0.05753435  0.2740728  1.00000000

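With use = "complete.obs", rows that contain a missing value are dropped before the correlations are computed, so cor() returns numbers instead of NA. Because mtcars has no missing values, the matrix above is identical to the one in section 2. You can also choose the method for computing correlations. By default, Pearson correlation is used, but the method argument lets you switch to Spearman or Kendall for different types of data. For mpg and hp, the three methods give: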

## [1] -0.7761684
## [1] -0.8946646
## [1] -0.7428125

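Each method has its strengths. Pearson measures linear association between numeric variables. Spearman and Kendall work on ranks, so they suit ordinal data, relationships that are monotonic but not linear, and data with outliers. This flexibility makes the cor() function a versatile tool for many analysis needs.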
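The sketch below illustrates both arguments. Since mtcars is complete, it first creates a copy with two artificial missing values; cars_na and the other variable names are hypothetical and exist only for this demonstration.

```r
# Copy mtcars and blank out two mpg values (illustration only)
cars_na <- mtcars
cars_na$mpg[c(3, 10)] <- NA

# Default use = "everything": any NA in the inputs makes the result NA
r_default <- cor(cars_na$mpg, cars_na$hp)

# use = "complete.obs": rows with a missing value are dropped first
r_complete <- cor(cars_na$mpg, cars_na$hp, use = "complete.obs")

print(r_default)
print(r_complete)

# The same pair of variables under all three methods, on the full data
cor_methods <- c("pearson", "spearman", "kendall")
r_by_method <- sapply(cor_methods,
                      function(m) cor(mtcars$mpg, mtcars$hp, method = m))
print(r_by_method)
```

For a matrix of many columns, use = "pairwise.complete.obs" is an alternative that drops rows separately for each pair of variables, which keeps more data but means different cells can be based on different rows.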

5. Visualizing Correlation with corrplot

Visual tools help transform numbers into clear insights. With the corrplot package, you can turn a correlation matrix into a graphic.

The resulting chart draws one circle per pair of variables, where the circle’s size shows the strength of the correlation and its color shows the direction. With corrplot’s default palette, blue marks positive correlations and red marks negative ones, though a custom palette can change this. A plot like this makes it quick to spot trends and compare many variables at once, which makes it a good choice for sharing findings with colleagues or in academic settings.
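A minimal sketch of such a plot, assuming the corrplot package is installed. The choices of type = "upper" and black labels are illustrative, not taken from the original post.

```r
# install.packages("corrplot")  # uncomment on first use
library(corrplot)

cor_matrix <- cor(mtcars)

# One circle per pair; only the upper triangle is drawn because the
# matrix is symmetric and the lower half would repeat the same values
corrplot(cor_matrix, method = "circle", type = "upper", tl.col = "black")
```

Other values of method, such as "number" or "color", show the same matrix as printed coefficients or as filled squares.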

6. Creating Heatmaps with ggplot2

Another way to visualize correlations is by creating a heatmap using ggplot2 and reshape2. First, compute the correlation matrix and melt it into long format, with one row per pair of variables. Then, build the heatmap by mapping the two variable names to the axes and the correlation value to the fill color.

The result is a static heatmap that uses a color gradient to show correlations. It lets you see at a glance which variables are closely related, and it helps you present data in a clear, engaging format.
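Those two steps might look like the sketch below. It assumes ggplot2 and reshape2 are installed; the color choices (blue for negative, red for positive) and the names cor_long and heatmap_plot are illustrative.

```r
# install.packages(c("ggplot2", "reshape2"))  # uncomment on first use
library(ggplot2)
library(reshape2)

# Step 1: correlation matrix in long format (columns Var1, Var2, value)
cor_long <- melt(cor(mtcars))

# Step 2: one tile per pair of variables, filled by correlation
heatmap_plot <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1),
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = NULL)

print(heatmap_plot)
```

Fixing the limits at -1 and 1 with a midpoint of 0 keeps the color scale comparable across datasets, so white always means no correlation.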

7. Statistical Testing with cor.test

Testing the significance of your correlations tells you how much confidence to place in them. Use the cor.test() function to see if a correlation is statistically significant. For mpg and hp, it reports:

## 
##  Pearson's product-moment correlation
## 
## data:  mtcars$mpg and mtcars$hp
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684

This function returns not only the correlation coefficient but also a t statistic, a p-value, and a 95 percent confidence interval. The p-value is the probability of seeing a correlation at least this strong in a sample if the true correlation were zero. Here it is about 1.8e-07, so chance alone is a very unlikely explanation for the relationship between mpg and hp. The confidence interval, from -0.89 to -0.59, gives the range of plausible values for the true correlation. This step is important for researchers who need solid evidence to back up their analysis.
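A sketch of the call behind this output, which also shows how to pull individual numbers out of the result (the name test_result is illustrative):

```r
# Pearson correlation test between mpg and hp (two-sided by default)
test_result <- cor.test(mtcars$mpg, mtcars$hp)
print(test_result)

# The result is a list, so each piece can be used on its own
test_result$estimate   # correlation coefficient
test_result$p.value    # p-value for the null hypothesis of zero correlation
test_result$conf.int   # 95 percent confidence interval
```

Like cor(), cor.test() accepts a method argument, so the same test can be run with "spearman" or "kendall" when a rank-based measure fits the data better.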

Final Thoughts and Call to Action

This post has shown you how to use the cor() function in R in many ways: basic correlation calculations with the mtcars dataset, full correlation matrices, custom functions with significance stars, handling missing data, visualizations, and significance tests. Each step is designed to give you clear insights and make your data analysis work easier. Now is the time to take your data skills to the next level. Book a free call with our experts at RStudioDatalab and enjoy a free exploratory data analysis along with a 10% discount on our services this month. If you’re not satisfied, we offer a full refund. Start transforming your data today and join the community of smart, data-driven students and researchers!