Setup

First, ensure all necessary packages are installed and loaded. We’ll need ggplot2 for plotting and gapminder for the dataset; dplyr is used later to summarize the data in Example 6.
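A minimal sketch of the setup step. It assumes a CRAN mirror is reachable, and the variable names required_packages and missing_packages are only illustrative:

```r
# Packages used in this session (dplyr is only needed for the bar plot example).
required_packages <- c("ggplot2", "gapminder", "dplyr")

# Install only what is not already available (assumes internet access to CRAN).
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) {
  install.packages(missing_packages)
}

# Attach the packages for the current session.
library(ggplot2)
library(gapminder)
library(dplyr)
```

Installation only has to happen once per machine, while the library() calls are needed in every new R session.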

Introduction to ggplot2

ggplot2 is a powerful and flexible R package, designed around the philosophy of tidy data, that makes it easy to build complex visualizations from a data frame. Its syntax differs from that of traditional R plotting functions such as plot(), but it follows a consistent set of principles, which makes it easy to learn even if you are new to it.

Understanding Basic Components of ggplot2

Every ggplot2 plot begins with the ggplot() function, which initializes a plot object. The first argument of ggplot() is usually a data frame, and the aes() function is used to define mappings between data variables and visual properties.

ggplot2’s Grammar:

The ggplot grammar, as implemented in the ggplot2 package in R, is a coherent system for describing and building graphs. It is based on the principle that any statistical graphic can be expressed by mapping data to aesthetic attributes (like color, shape, and size) of geometric objects (like points, lines, and bars) in a structured way, allowing for complex and customizable visualizations that are both informative and aesthetically pleasing. This systematized approach enables clear, concise, and consistent representation of data through graphics.

  • Data: The dataset being visualized, typically a data frame where each column is a variable and each row is an observation.
  • Aesthetics (aes): Defines mappings between data and visual properties (e.g., x and y positions, colors, shapes, sizes).
  • Geometric Objects (geoms): The visual representations of data points (e.g., points, lines, bars).
  • Facets: Enable the creation of subplots based on categorical variables (facet_wrap, facet_grid).
  • Statistical Transformations (stats): Statistical summaries for plotting (e.g., binning data, calculating regression lines).
  • Coordinate Systems: Define the graph’s space organization (e.g., coord_cartesian, coord_polar, coord_flip).
  • Themes: Control the non-data ink of a plot, including backgrounds, grid lines, and axis texts (e.g., theme, theme_minimal()).
  • Labels: Include plot titles, axis labels, and legends for plot comprehension.
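The components listed above can be combined in a single plot. The following is only a sketch using the gapminder data; the particular geoms, limits, and labels are illustrative choices, not a prescribed recipe:

```r
library(ggplot2)
library(gapminder)

grammar_plot <- ggplot(gapminder,                             # data
                       aes(x = gdpPercap, y = lifeExp)) +     # aesthetics
  geom_point(alpha = 0.3) +                                   # geometric object
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # statistical transformation
  facet_wrap(~ continent) +                                   # facets
  coord_cartesian(ylim = c(20, 90)) +                         # coordinate system
  labs(title = "Life expectancy vs GDP per capita",           # labels
       x = "GDP per capita", y = "Life expectancy") +
  theme_minimal()                                             # theme

grammar_plot
```

Each layer is added with +, so any component can be swapped or removed without touching the others.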

This list gives a concise overview of the essential components of ggplot2’s grammar of graphics.

Quick data exploration

Let’s see the first rows:
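One way to print them, assuming the gapminder package is attached (head() returns the first six rows by default; first_rows is just an illustrative name):

```r
library(gapminder)

# Keep the first six rows of the dataset and print them.
first_rows <- head(gapminder)
first_rows
```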

## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

We can also check the column data types and dataset dimensions:
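A sketch of one way to do this, assuming dplyr is installed; glimpse() prints the dimensions followed by one line per column:

```r
library(gapminder)
library(dplyr)

# Dimensions as a plain vector: rows, then columns.
gapminder_dims <- dim(gapminder)

# Compact overview of every column and its type.
glimpse(gapminder)
```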

## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …

Example 1: Scatter Plot

We will create a basic scatter plot of GDP per capita vs life expectancy for the year 2007:
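A minimal sketch of such a plot. The object names gapminder_2007 and example1_plot are our own choices for illustration:

```r
library(ggplot2)
library(gapminder)

# Keep only the observations for 2007.
gapminder_2007 <- subset(gapminder, year == 2007)

# One point per country: GDP per capita on x, life expectancy on y.
example1_plot <- ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

example1_plot
```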

Example 2: Multi-Scatter Plot

Now let’s plot the relationship between GDP per capita and life expectancy across all years, grouped by continent:
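A possible version, assuming that the grouping is expressed through point color (the name example2_plot and the alpha value are illustrative):

```r
library(ggplot2)
library(gapminder)

# All years at once; each continent gets its own color.
example2_plot <- ggplot(gapminder,
                        aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5)  # transparency helps with overlapping points

example2_plot
```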

Change the x-axis to a logarithmic scale to make the plot easier to read (useful when the values span a very wide range):
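A sketch of the same scatter plot with a base-10 logarithmic x-axis, added through scale_x_log10(); the object name example2_log_plot is illustrative:

```r
library(ggplot2)
library(gapminder)

example2_log_plot <- ggplot(gapminder,
                            aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10()  # spreads out the many low-GDP observations

example2_log_plot
```

Only the axis scale changes; the underlying data values are left untouched.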

Example 3: Customizing plots

ggplot2 allows for extensive customization to make your plots communicate more effectively. Here’s how to customize the appearance of a plot:
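A sketch of a customized plot, stored as example3_plot so it can be reused later. The specific choices here (2007 data, point size mapped to population, titles, theme settings) are assumptions made for illustration:

```r
library(ggplot2)
library(gapminder)

gapminder_2007 <- subset(gapminder, year == 2007)

example3_plot <- ggplot(gapminder_2007,
                        aes(x = gdpPercap, y = lifeExp,
                            color = continent, size = pop)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  labs(title = "Life expectancy vs GDP per capita (2007)",
       x = "GDP per capita (log scale)",
       y = "Life expectancy (years)",
       color = "Continent",
       size = "Population") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"),  # emphasize the title
        legend.position = "bottom")                # move legends below the plot

example3_plot
```

labs() controls the text, theme_minimal() sets the overall look, and theme() fine-tunes individual elements.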

Example 4: Line Plot

Next, we will visualize the change in GDP per capita over time for China.
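A minimal sketch, with china_data and example4_plot as illustrative names:

```r
library(ggplot2)
library(gapminder)

# Keep only the rows for China (one row per surveyed year).
china_data <- subset(gapminder, country == "China")

example4_plot <- ggplot(china_data, aes(x = year, y = gdpPercap)) +
  geom_line() +
  labs(title = "GDP per capita in China",
       x = "Year", y = "GDP per capita")

example4_plot
```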

Example 5: Multiline Plot

Next, we’ll create a line plot to observe the trend in life expectancy over the years for each country in Asia.
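One way to sketch this, assuming each country is drawn as its own colored line (asia_data and example5_plot are illustrative names):

```r
library(ggplot2)
library(gapminder)

asia_data <- subset(gapminder, continent == "Asia")

# group = country draws one line per country instead of a single zigzag.
example5_plot <- ggplot(asia_data,
                        aes(x = year, y = lifeExp,
                            group = country, color = country)) +
  geom_line() +
  labs(title = "Life expectancy in Asia",
       x = "Year", y = "Life expectancy (years)")

example5_plot
```

With this many countries the legend becomes large; adding theme(legend.position = "none") is one option if the individual colors are not needed.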

Example 6: Bar Plot

Bar plots are useful for comparing quantities across different groups. Here we plot the average life expectancy per continent. We first use dplyr to calculate the mean before passing the data to ggplot.
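A sketch of the two steps. The names continent_means and mean_lifeExp are illustrative, and the mean is taken over all years and countries in each continent:

```r
library(ggplot2)
library(gapminder)
library(dplyr)

# Step 1: one row per continent with its mean life expectancy.
continent_means <- gapminder %>%
  group_by(continent) %>%
  summarise(mean_lifeExp = mean(lifeExp))

# Step 2: geom_col() draws bars whose heights are the precomputed values.
example6_plot <- ggplot(continent_means,
                        aes(x = continent, y = mean_lifeExp)) +
  geom_col() +
  labs(x = "Continent", y = "Mean life expectancy (years)")

example6_plot
```

geom_col() is used rather than geom_bar() because the bar heights are already computed; geom_bar() counts rows by default.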

Example 7: Time Series

Next, we will create a time series plot showing life expectancy over time in India. The same approach works for any other country and variable, for example GDP per capita in China, as in Example 4.
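A minimal sketch covering both cases mentioned here: life expectancy in India and GDP per capita in China. The helper plot_country_series() is a hypothetical function written for this example, not part of ggplot2:

```r
library(ggplot2)
library(gapminder)

# Hypothetical helper: time series of one gapminder variable for one country.
plot_country_series <- function(country_name, variable) {
  country_data <- subset(gapminder, country == country_name)
  ggplot(country_data, aes(x = year, y = .data[[variable]])) +
    geom_line() +
    geom_point() +  # mark the years that were actually observed
    labs(title = paste(variable, "over time in", country_name),
         x = "Year", y = variable)
}

india_plot <- plot_country_series("India", "lifeExp")
china_plot <- plot_country_series("China", "gdpPercap")

india_plot
```

The .data[[variable]] pronoun lets the column name be passed as a string, so the same function serves any numeric column.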

Example 8: Adding Facets

Facets in ggplot2 are used to split data into multiple small plots based on the values of one or more categorical variables, allowing for easy comparison across groups.
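A sketch using facet_wrap() to draw one panel per continent; the choice of variables and the log-scaled x-axis are illustrative:

```r
library(ggplot2)
library(gapminder)

example8_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  facet_wrap(~ continent)  # one small plot per continent, shared axes

example8_plot
```

facet_grid() works similarly but arranges the panels in rows and columns defined by two variables.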

Example 9: Saving a plot

Let’s recover the plot example3_plot from Example 3 and save it in different formats:

Saving as PDF

If the file name ends in ‘.pdf’, ggplot2 exports the plot as a PDF.
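A sketch using ggsave(), which picks the output device from the file extension. We assume ggsave() is the saving function intended here; the plot below is a simplified stand-in for example3_plot, and the temporary directory and dimensions are illustrative:

```r
library(ggplot2)
library(gapminder)

# Simplified stand-in for example3_plot so this snippet runs on its own.
example3_plot <- ggplot(subset(gapminder, year == 2007),
                        aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_minimal()

# Write to a temporary directory; replace with any path you prefer.
pdf_path <- file.path(tempdir(), "example3_plot.pdf")
ggsave(pdf_path, plot = example3_plot, width = 8, height = 6)  # size in inches
```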

Saving as PNG

In the same way, we can save it as a PNG image by changing the file extension. The same applies to other formats (jpg, tiff, etc.).
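A sketch of the PNG case, again assuming ggsave(). For raster formats the dpi argument sets the resolution; the plot is a simplified stand-in for example3_plot and the path is illustrative:

```r
library(ggplot2)
library(gapminder)

# Simplified stand-in for example3_plot so this snippet runs on its own.
example3_plot <- ggplot(subset(gapminder, year == 2007),
                        aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  theme_minimal()

# Only the extension changes compared with the PDF case.
png_path <- file.path(tempdir(), "example3_plot.png")
ggsave(png_path, plot = example3_plot, width = 8, height = 6, dpi = 300)
```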

Conclusion

This session provided a brief introduction to the ggplot2 package and demonstrated how to use it to create various types of plots. Practice these examples and try modifying the aesthetics and other parameters.