Before applying formal multivariate techniques, it is important to first explore the data using graphical methods. These visualisations provide an intuitive understanding of the structure of the data and help identify patterns, relationships, and potential issues such as outliers or unusual observations.

A key aspect of this exploratory step is assessing whether there are correlations between variables. Many multivariate methods, such as PCA, Exploratory Factor Analysis (EFA), and Discriminant Analysis, rely on the presence of relationships between variables to extract meaningful structure. Graphical tools allow us to quickly evaluate whether such relationships exist and whether these methods are appropriate.

Common graphical displays include scatterplots, pairwise scatterplot matrices, and correlation heatmaps. These plots help reveal trends, clusters, and dependencies between variables, and provide a first indication of the underlying dimensionality of the data.

In the following section, we will use these graphical techniques to explore the data before moving on to more formal methods for modelling and interpreting multivariate relationships.

Read data

First we need to read our data into R. Throughout this example we will use the wine data. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

The attributes are:

Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

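The wine data are stored as a comma-separated text file without a header row, so we can read them with the read.table() function in R, setting sep="," and assigning the column names ourselves afterwards.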

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")

colnames(wine) <- c("Cultivar","Alcohol","Malic.acid","Ash","Alcalinity.of.ash","Magnesium","Total.phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color.intensity","Hue","OD280/OD315.of.diluted.wines","Proline")

dim(wine)
#> [1] 178  14

head(wine, 5)
#>   Cultivar Alcohol Malic.acid  Ash Alcalinity.of.ash Magnesium Total.phenols
#> 1        1   14.23       1.71 2.43              15.6       127          2.80
#> 2        1   13.20       1.78 2.14              11.2       100          2.65
#> 3        1   13.16       2.36 2.67              18.6       101          2.80
#> 4        1   14.37       1.95 2.50              16.8       113          3.85
#> 5        1   13.24       2.59 2.87              21.0       118          2.80
#>   Flavanoids Nonflavanoid phenols Proanthocyanins Color.intensity  Hue
#> 1       3.06                 0.28            2.29            5.64 1.04
#> 2       2.76                 0.26            1.28            4.38 1.05
#> 3       3.24                 0.30            2.81            5.68 1.03
#> 4       3.49                 0.24            2.18            7.80 0.86
#> 5       2.69                 0.39            1.82            4.32 1.04
#>   OD280/OD315.of.diluted.wines Proline
#> 1                         3.92    1065
#> 2                         3.40    1050
#> 3                         3.17    1185
#> 4                         3.45    1480
#> 5                         2.93     735

The wine dataset contains 178 observations of 14 variables, including the 13 measured quantities of chemicals and the variable Cultivar, which indicates the type of grape from which the wine was produced.
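
As a small offline check of the import step, the sketch below reads the five records shown in the head() output from an inline string instead of the URL. The object names wine_txt, wine_head and raw_names are hypothetical, and the use of make.names() and factor() is an optional variation rather than part of the original workflow: it would give syntactic column names (no spaces or slashes) and store Cultivar as a categorical variable.

```r
# First five records of the wine data, copied from the head() output above.
wine_txt <- "1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
1,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
1,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
1,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735"

# header = FALSE is the read.table() default, so the columns start out as V1..V14.
wine_head <- read.table(text = wine_txt, sep = ",")

raw_names <- c("Cultivar", "Alcohol", "Malic.acid", "Ash", "Alcalinity.of.ash",
               "Magnesium", "Total.phenols", "Flavanoids", "Nonflavanoid phenols",
               "Proanthocyanins", "Color.intensity", "Hue",
               "OD280/OD315.of.diluted.wines", "Proline")

# make.names() replaces spaces and slashes with dots, so every column
# can be referred to directly, e.g. wine_head$Nonflavanoid.phenols.
colnames(wine_head) <- make.names(raw_names)

# Cultivar is a class label, not a measurement, so store it as a factor.
wine_head$Cultivar <- factor(wine_head$Cultivar)

str(wine_head)               # one factor and 13 numeric columns
colSums(is.na(wine_head))    # number of missing values per column
```

The same steps could be applied to the full wine object; the plots in the rest of this section do not depend on them.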

Pairwise scatterplots

A simple and effective way to begin exploring relationships between variables is through scatterplots. These plots display the relationship between two variables at a time and allow us to visually assess patterns such as linear relationships, clusters, or outliers.

In R, scatterplots can be created using the ggplot2 package, which provides a flexible and consistent framework for data visualisation. The geom_point() function is used to create scatterplots by plotting individual observations as points.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.5.3
ggplot(data=wine, aes(x=Alcohol, y=Malic.acid)) + geom_point()
ggplot(data=wine, aes(x=Ash, y=Alcalinity.of.ash)) + geom_point()
ggplot(data=wine, aes(x=Hue, y=Color.intensity)) + geom_point()

By creating multiple pairwise scatterplots for different combinations of variables, we can begin to identify potential correlations and structures in the data. This provides an important first step before applying more formal multivariate techniques.

In addition to basic scatterplots, it is often useful to incorporate known group information into the visualisation. In this dataset, the variable Cultivar indicates the class membership of each observation. By colouring the points according to this variable, we can assess whether the groups show distinct patterns or separation in the data.

This can be done in ggplot2 by mapping the color aesthetic to the Cultivar variable:

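# Cultivar is stored as an integer; factor() gives a discrete colour legend instead of a gradient
ggplot(data=wine, aes(x=Hue, y=Color.intensity, col=factor(Cultivar))) + geom_point() + labs(colour="Cultivar")
ggplot(data=wine, aes(x=Alcohol, y=Malic.acid, col=factor(Cultivar), size=Magnesium)) + geom_point() + labs(colour="Cultivar")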

Another useful way to visualise relationships between multiple variables is through the scatterplotMatrix() function from the car package. This function creates an enhanced scatterplot matrix with additional features compared to the base pairs() function.

library(car)
#> Loading required package: carData
scatterplotMatrix(wine[,2:14])
scatterplotMatrix(wine[,2:14],groups=wine$Cultivar)

Compared to simpler approaches, scatterplotMatrix() provides:

Pairwise scatterplots for all variable combinations
Histograms or density plots on the diagonal
Optional group colouring
Regression lines and smoothers overlaid on each panel

For the interpretation of the plots, you can:

Look for linear trends → indication of correlation
Identify clusters or group separation
Detect outliers or unusual observations
Assess whether variables show redundancy
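
For comparison, the base pairs() function mentioned earlier can be customised to show some of the same information. The following is a minimal sketch, not the approach used in this section: panel_cor is a hypothetical helper name, and the built-in iris measurements are used as a stand-in so the block runs without a download.

```r
# Hypothetical panel function: print the correlation of the two variables
# in the middle of the panel instead of drawing points.
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))   # switch the panel to unit coordinates
  on.exit(par(old))                 # restore the coordinates afterwards
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, sprintf("%.2f", r), cex = 1.2)
}

# Scatterplots with a smoother below the diagonal, correlations above it.
pairs(iris[, 1:4],
      lower.panel = panel.smooth,
      upper.panel = panel_cor)
```

With the wine data loaded, the same call could be made with wine[,2:14] in place of iris[, 1:4].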

A similar type of visualisation can be obtained using the pairs.panels() function from the psych package. It provides an enhanced scatterplot matrix that combines several types of information into one display.

library(psych)
#> Warning: package 'psych' was built under R version 4.5.3
#> 
#> Attaching package: 'psych'
#> The following object is masked from 'package:car':
#> 
#>     logit
#> The following objects are masked from 'package:ggplot2':
#> 
#>     %+%, alpha
pairs.panels(wine[,2:14])

This function produces a matrix with multiple layers of information:

Scatterplots (below the diagonal): Show pairwise relationships between variables
Correlation coefficients (above the diagonal): Display the strength and direction of linear relationships
Histograms or density plots (on the diagonal): Show the distribution of each variable
Optional smoothing lines: Help identify trends in the relationships

Some additional options are:

library(psych)
pairs.panels(wine[,2:14],
             method = "pearson",
             hist.col = "lightblue",
             density = TRUE,
             ellipses = TRUE)

Where:

method: type of correlation (e.g. Pearson)
density = TRUE: adds density curves
ellipses = TRUE: adds correlation ellipses
hist.col: colour of histograms
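
To get a feeling for what the method argument changes, the sketch below compares Pearson and Spearman correlation matrices computed with the base cor() function. It uses the built-in iris measurements as a stand-in (an assumption made so the block runs on its own); the object names are illustrative.

```r
num <- iris[, 1:4]

# Pearson measures linear association; Spearman is computed on ranks and
# is therefore less affected by skewed variables and outliers.
pearson  <- cor(num, method = "pearson")
spearman <- cor(num, method = "spearman")

# Large entries point to pairs where the choice of method matters.
round(spearman - pearson, 2)
```

For the wine data, the same comparison could be run on wine[,2:14].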

For the interpretation:

High correlations (values near +/-1) indicate strong relationships
Elliptical shapes in scatterplots suggest linear association
Wide, circular clouds indicate weak or no relationship
Histograms reveal skewness or unusual distributions

pairs.panels() is a powerful exploratory tool because it combines scatterplots, correlations, and distributions into a single figure. This makes it particularly useful for quickly assessing the overall structure of multivariate data before applying formal methods such as PCA or factor analysis.

Correlation plots / heatmaps

A correlation plot is a compact way to visualise the correlation matrix and quickly assess relationships between many variables at once. The corrplot package provides a flexible function, corrplot(), with several display options, controlled by the method argument.

Color-based plot (heatmap)

library(corrplot)
#> Warning: package 'corrplot' was built under R version 4.5.3
#> corrplot 0.95 loaded
corrplot(cor(wine[,2:14]),method='color')

Colours indicate strength and direction
Typically: blue = positive, red = negative
Intensity reflects magnitude

Circle plot

corrplot(cor(wine[,2:14]),method='circle')

Size of circles represents correlation strength
Colour still indicates direction

Number plot

corrplot(cor(wine[,2:14]),method='number')

Displays the exact correlation values
Useful when precise values are needed

Shade plot

corrplot(cor(wine[,2:14]),method='shade')

Uses shading patterns to indicate correlations
Less commonly used, but visually distinct

You can improve readability with extra arguments:

corrplot(cor(wine[,2:14]),method='color', type='upper',addCoef.col='black',
         tl.col='black',tl.srt=45,number.cex=0.8,number.digits=1)

type = "upper": show only the upper triangle of the matrix
addCoef.col: colour of the overlaid correlation values
tl.col: colour of the text labels
tl.srt: rotation angle of the text labels (in degrees)
number.cex: size of the numbers
number.digits: number of decimal places

The corrplot() function provides a flexible and visually intuitive way to explore correlation structures. By switching between methods such as color, circle, and number, you can emphasize either visual patterns or exact values, depending on the goal of the analysis.
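
A correlation plot can be complemented by a short numerical summary of the strongest relationships. The function below is a hypothetical helper (strongest_pairs is not part of corrplot or any other package) that lists each pair of variables once, sorted by absolute correlation. It is demonstrated on the built-in iris measurements so that it runs without a download.

```r
# Hypothetical helper: the n variable pairs with the largest |r|.
strongest_pairs <- function(dat, n = 5) {
  cm  <- cor(dat, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and drops the diagonal.
  idx <- which(upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                    var2 = colnames(cm)[idx[, "col"]],
                    r    = cm[idx])
  out <- out[order(abs(out$r), decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, n)
}

strongest_pairs(iris[, 1:4], n = 3)
```

With the wine data loaded, strongest_pairs(wine[,2:14]) would give the corresponding list for the 13 chemical measurements.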

In addition to functions such as corrplot(), correlation matrices can also be visualized using ggplot2, which offers greater flexibility and customization. This approach involves reshaping the correlation matrix into a tidy format and then displaying it as a heatmap.

First, the correlation matrix is computed and converted into a long (tidy) format:

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ lubridate 1.9.4     ✔ tibble    3.3.0
#> ✔ purrr     1.1.0     ✔ tidyr     1.3.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ psych::%+%()    masks ggplot2::%+%()
#> ✖ psych::alpha()  masks ggplot2::alpha()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ dplyr::recode() masks car::recode()
#> ✖ purrr::some()   masks car::some()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
wine_cormat <- cor(wine[,2:14]) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  pivot_longer(-rowname)

This transformation results in a dataset where each row represents a pair of variables and their corresponding correlation value.
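
To see what this long format looks like without loading the tidyverse, a rough base R equivalent is sketched below. It uses the built-in iris measurements as a stand-in, and the column names are set by hand to match the rowname, name and value columns used in this section; the rows may come out in a different order than with pivot_longer().

```r
cm <- cor(iris[, 1:4])

# as.table() + as.data.frame() unfolds the matrix into one row per cell.
cm_long <- as.data.frame(as.table(cm), responseName = "value")
names(cm_long)[1:2] <- c("rowname", "name")

head(cm_long)
nrow(cm_long)   # 4 variables give 4 * 4 = 16 rows
```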

The correlation heatmap can then be constructed using ggplot2:

wine_cormat %>%
  ggplot(aes(x=rowname,y=name,fill=value))+
  geom_tile() +
  geom_text(aes(label=round(value,2)),color="white") +
  scale_fill_gradient2(low = "red",
                       high = "darkgreen",
                       mid="white",
                       midpoint=0,
                       limits=c(-1,1),
                       name="Pearson\nCorrelation")

Interpretation:

Each square represents the correlation between two variables
The color indicates the strength and direction of the relationship:
- Red: strong negative correlation
- White: little or no correlation
- Dark green: strong positive correlation
The numbers inside the squares show the exact correlation values

3D scatterplots

In some cases, relationships between variables may not be fully captured in two dimensions. A 3D scatterplot allows us to visualise the relationship between three variables simultaneously. In R, this can be done using the scatterplot3d package.

library(scatterplot3d)
#> Warning: package 'scatterplot3d' was built under R version 4.5.2
scatterplot3d(wine$Magnesium, wine$Flavanoids, wine$Hue, 
              main="3D Scatterplot",
              xlab="Magnesium", ylab="Flavanoids", zlab="Hue",
              pch=16, color= wine$Cultivar)

In a 3D scatterplot, each axis represents one variable, each point is an observation and the spatial arrangement shows how the three variables relate. The colors of the points indicate group membership (Cultivar).

Clusters of points may indicate group structure, while separation between colors suggests that groups differ across variables. Overlapping points indicate weaker separation, and patterns (e.g. planes or trends) may suggest relationships between variables.

Static 3D plots can be harder to interpret than 2D plots, because perspective distorts distances and overlapping points may obscure patterns. Rotating the plot interactively often helps, but this is not available in basic scatterplot3d. The scatterplot3d() function provides a simple way to extend scatterplots into three dimensions, allowing a richer visual exploration of relationships between variables. However, it is best used alongside 2D visualisations for clearer interpretation.
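
One partial workaround for the lack of rotation is to redraw the same plot from several viewing angles using the angle argument of scatterplot3d(). The sketch below is only an illustration: it uses the built-in iris data as a stand-in, the chosen angles are arbitrary, and the code is skipped if the package is not installed.

```r
angles <- c(20, 55, 120)   # arbitrary viewing angles, in degrees

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  library(scatterplot3d)
  for (a in angles) {
    # Same data each time; only the angle between the x and y axes changes.
    scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
                  main = paste("angle =", a),
                  xlab = "Sepal.Length", ylab = "Sepal.Width",
                  zlab = "Petal.Length",
                  pch = 16, color = as.integer(iris$Species), angle = a)
  }
}
```

Groups that overlap from one angle but separate from another are easier to spot this way than in a single static view.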

Graphical Displays - Exploratory Analysis

dr. Annelies Agten

2026-04-27
